2020-08-03

Onchocerciasis

  • Disease caused by the filarial nematode Onchocerca volvulus.
  • It is common in tropical and subtropical areas of Africa and some parts of South America.
  • At least 17 million people are infected globally and 198 million are at risk of infection (NTD Modelling Consortium Onchocerciasis Group, 2019).
    Global burden and clinical manifestations

Life cycle of O. volvulus

  • It needs two different hosts to complete its life cycle: humans and black flies (Simulium sp.)
    Life cycle of O. volvulus (Basáñez et al., 2016)

  • Ivermectin is the main drug used for treatment and control.

Modeling in onchocerciasis

  • Because the life cycle is complex and O. volvulus cannot be grown in vitro, mathematical models are important for studying the parasite’s biology and the disease epidemiology (Basáñez et al., 2016).
    Modelling study showing the effect of ivermectin treatment frequency on microfilarial prevalence (Hamley et al., 2019)

Rationale: Why geospatial model?

  • A recent epidemiological mapping study shows that onchocerciasis prevalence across Africa is heterogeneous and patchy.
    Onchocerciasis prevalence map (Zouré et al., 2014)

  • As onchocerciasis control progresses towards elimination, geospatially explicit models become increasingly important.

Project Aims

  • Aim 1: To develop a geospatial modeling framework for the analysis of onchocerciasis prevalence
    • Identify different types of data needed for the analysis
    • Determine the ecological and socio-demographic factors driving onchocerciasis epidemiology
  • Aim 2: To investigate methods for incorporating vector and parasite genetic data in geospatial models
    • Determine ecological factors affecting vector and parasite population structure
    • Infer migration patterns and dispersal of vector populations using landscape genetics analysis
  • Aim 3: To model different scenarios, such as the effects of drug intervention and vector control, at different geospatial scales

Expectations

  • An updated spatio-temporal prevalence map for Ethiopia and other African regions depending on data availability.
  • Identification of ecological factors driving the vector and parasite population distribution, and thus, also onchocerciasis prevalence.
  • A method to incorporate genetic data into a geospatially explicit model for onchocerciasis.
  • A tool to monitor and formulate strategies for onchocerciasis elimination campaigns.

Project progress

Aim 1

Geospatial modelling framework for prevalence data

  • Identified sources of data needed
    • Prevalence data (systematic literature search, relevant public health institutes)
    • Climate and environmental data (Worldclim, SEDAC, NOAA, satellite data repository)
    • Genetic data (lab repository)
  • Explored two different geospatial modeling frameworks for prevalence data
    • Machine learning approach: Random forest algorithm
    • Bayesian approach: Integrated Nested Laplace Approximation (INLA)

Data sources for the prototype geospatial model

  • Ethiopian prevalence data from a publicly available database (Hill et al., 2019).

Onchocerciasis prevalence data from Ethiopia used for analysis

Climate and socio-demographic covariates

Raster layer of some of the covariates masked to the border of Ethiopia

  • A stack of 33 different covariates was explored

Selection of covariates

  • A hierarchical clustering algorithm was used to select the most representative covariates
    Dendrogram from the clustering analysis showing the different clusters of covariates

  • Lists of covariates based on 5, 10 and 15 clusters were generated
  • The potential influence of covariates (distance to river, rural-urban extent) on onchocerciasis prevalence was also considered
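The covariate selection step could be sketched as follows. This is a minimal illustration, not the analysis behind the dendrogram: the slides do not state which distance measure, linkage method or software was used, so the correlation-based distance, the average linkage, the rule for picking a representative and the synthetic covariate values are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Synthetic stand-in for covariate values sampled at survey locations:
# 200 locations x 12 covariates, built from 4 latent signals so that
# groups of three covariates are strongly correlated (hypothetical data).
latent = rng.normal(size=(200, 4))
covariates = np.repeat(latent, 3, axis=1) + 0.1 * rng.normal(size=(200, 12))
names = [f"cov_{i}" for i in range(covariates.shape[1])]

# Distance between covariates: 1 - |Pearson correlation| (an assumption).
corr = np.corrcoef(covariates, rowvar=False)
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
tree = linkage(squareform(dist, checks=False), method="average")


def representatives(n_clusters):
    """Cut the tree into n_clusters and keep one covariate per cluster:
    the one with the highest mean |correlation| to its cluster members."""
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    chosen = []
    for lab in np.unique(labels):
        members = np.where(labels == lab)[0]
        sub = np.abs(corr[np.ix_(members, members)])
        chosen.append(names[members[sub.mean(axis=1).argmax()]])
    return chosen


selected = representatives(4)
print(selected)
```

Calling `representatives` with different cluster counts gives covariate lists of different sizes from the same tree, which mirrors the 5, 10 and 15 cluster lists described above.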

1. Random Forest Model

  • Spatial dependence in the data was accounted for by incorporating buffer distances to the sample locations.
  • Models were selected using a k-fold cross-validation approach.
    Five-fold cross-validation for model validation and selection

  • Root mean square error (RMSE) and R-squared values were calculated for each model
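A minimal sketch of this workflow is given below. "Buffer distances" is read here as the distance from each location to every sampled location, added as extra predictors. That reading, the scikit-learn implementation, the hyperparameters and the synthetic data are all assumptions, not the settings used for the results in these slides.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)

# Synthetic survey data (hypothetical): coordinates, two covariates and a
# spatially structured prevalence in (0, 1).
n = 150
coords = rng.uniform(0, 10, size=(n, 2))
covs = rng.normal(size=(n, 2))
signal = np.sin(coords[:, 0]) + 0.5 * covs[:, 0]
prevalence = 1 / (1 + np.exp(-signal + 0.2 * rng.normal(size=n)))


def distance_features(points, refs):
    """Euclidean distance from each point to every reference location."""
    return np.linalg.norm(points[:, None, :] - refs[None, :, :], axis=2)


def cross_validate(n_splits=5):
    """Return mean RMSE and R-squared over the held-out folds."""
    rmse, r2 = [], []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in folds.split(coords):
        # Distances are measured to the training locations only, so the
        # held-out fold does not leak into the predictors.
        x_train = np.hstack([covs[train], distance_features(coords[train], coords[train])])
        x_test = np.hstack([covs[test], distance_features(coords[test], coords[train])])
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(x_train, prevalence[train])
        pred = model.predict(x_test)
        rmse.append(np.sqrt(mean_squared_error(prevalence[test], pred)))
        r2.append(r2_score(prevalence[test], pred))
    return float(np.mean(rmse)), float(np.mean(r2))


mean_rmse, mean_r2 = cross_validate()
print(f"RMSE = {mean_rmse:.3f}, R^2 = {mean_r2:.3f}")
```

Repeating `cross_validate` for each candidate covariate list and comparing the resulting RMSE and R-squared values is one way to choose between models.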

Random Forest Model selection

Prevalence prediction with Random Forest Model

Predicted median prevalence with Random Forest Model

Random Forest Model: Prediction error

  • Prediction error was calculated from the upper and lower limits of the predicted prevalence

    The prediction error is higher in the locations where predicted prevalence is higher
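One way to obtain upper and lower limits from a random forest is sketched below, using the spread of the individual tree predictions. This is only a rough proxy. The slides do not say how the limits were computed, and a quantile regression forest, which uses the training responses stored in each leaf, would be the more principled choice. The data and the 95% interval are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Hypothetical training data: three covariates and a prevalence in (0, 1).
x = rng.normal(size=(200, 3))
y = 1 / (1 + np.exp(-(x[:, 0] + 0.3 * rng.normal(size=200))))

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
forest.fit(x, y)

# Predictions of each individual tree at new locations: shape (trees, locations).
x_new = rng.normal(size=(50, 3))
per_tree = np.stack([tree.predict(x_new) for tree in forest.estimators_])

# Lower limit, median and upper limit across trees for every location.
lower, median, upper = np.percentile(per_tree, [2.5, 50, 97.5], axis=0)
interval_width = upper - lower  # used here as the "prediction error"
print(interval_width[:5])
```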

2. Bayesian Approach: INLA

  • Allows prior knowledge about the parameters to be incorporated in the form of probability distributions
  • The number of cases (\(Y_i\)) observed out of the total number of people tested (\(N_i\)) at location \(\boldsymbol{x}_i\) was assumed to follow a binomial distribution: \[ Y_i \mid P(\boldsymbol{x}_i) \sim \mathrm{Binomial}(N_i, P(\boldsymbol{x}_i)) \]
  • The log odds of prevalence were modeled as \[ \mathrm{logit}(P(\boldsymbol{x}_i)) = \beta_0 + \mathbf{X}_i^\intercal \boldsymbol{\beta} + S(\boldsymbol{x}_i), \] where \(\mathbf{X}_i\) is the vector of covariates at location \(\boldsymbol{x}_i\) and \(\boldsymbol{\beta}\) the corresponding coefficients.
  • \(S(\cdot)\) is a spatial random effect with a Matérn covariance function.
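To make the model structure concrete, the sketch below simulates data from it rather than fitting it. Fitting would be done with R-INLA, which is not shown here. It uses the standard Matérn form \(\sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} (\kappa d)^\nu K_\nu(\kappa d)\). All parameter values, the single covariate and the site locations are hypothetical.

```python
import numpy as np
from scipy.special import expit, gamma, kv


def matern_covariance(d, sigma2=1.0, kappa=1.0, nu=1.0):
    """Matern covariance: sigma2 * 2^(1-nu)/Gamma(nu) * (kappa d)^nu * K_nu(kappa d)."""
    scaled = kappa * np.atleast_1d(np.asarray(d, dtype=float))
    cov = np.full_like(scaled, sigma2)  # the limit as d -> 0 is sigma2
    pos = scaled > 0
    cov[pos] = sigma2 * 2 ** (1 - nu) / gamma(nu) * scaled[pos] ** nu * kv(nu, scaled[pos])
    return cov


rng = np.random.default_rng(3)
sites = rng.uniform(0, 10, size=(30, 2))  # hypothetical survey locations
dists = np.linalg.norm(sites[:, None] - sites[None, :], axis=2)
cov = matern_covariance(dists, sigma2=0.5, kappa=0.8, nu=1.0)

# One draw of the spatial effect S(x_i), then the linear predictor and data.
s = rng.multivariate_normal(np.zeros(len(sites)), cov)
x_cov = rng.normal(size=len(sites))        # a single covariate X_i
beta0, beta1 = -1.0, 0.5                   # hypothetical coefficients
p = expit(beta0 + beta1 * x_cov + s)       # inverse logit gives P(x_i)
n_tested = rng.integers(20, 100, size=len(sites))
y = rng.binomial(n_tested, p)              # Y_i ~ Binomial(N_i, P(x_i))
print(y[:5], n_tested[:5])
```

Nearby sites share similar values of `s`, so their simulated prevalences are correlated even after accounting for the covariate. This is the role the spatial random effect plays in the model.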

Prevalence prediction with INLA Model

Mean prevalence map generated from the INLA model

  • The Great Rift Valley appears to be the major geographical barrier influencing onchocerciasis epidemiology

INLA Model: Prediction error

Areas with ground-truth data have lower prediction error

Comparison of predictions between Random forest model and INLA model

  • The correlation between predicted and observed prevalence was higher for the random forest model (97%) than for the INLA model (89%).
    Scatter plot of observed and predicted prevalence for the random forest and INLA models

Next steps

  • Collate prevalence data with greater spatial and temporal coverage.
  • Prepare additional covariates capturing river flow, along with time-varying climate and socio-demographic covariates.
    Example spatio-temporal map at different time slices

Next steps (contd.)

  • Estimating epidemiologically relevant parameters from parasite genetic data with landscape genetic analysis.
    • Create a connectivity and resistance surface map which might provide insight about the migration patterns of vector populations.
    • Identify environmental factors affecting the population structure of vectors and parasites.
  • Expanding the current empirical geospatial model into a dynamic model, which will provide greater flexibility for modeling different intervention scenarios.
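As an illustration of the resistance-surface idea in the list above, the sketch below computes least-cost distances across a toy grid with Dijkstra's algorithm. It is not a landscape genetics pipeline. The grid values, the 4-neighbour movement rule and the step cost (mean resistance of the two cells) are assumptions chosen for the example.

```python
import heapq


def least_cost_distances(resistance, source):
    """Dijkstra over a grid: the cost of a step is the mean resistance of
    the two cells it connects (4-neighbour moves only)."""
    rows, cols = len(resistance), len(resistance[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    best[source[0]][source[1]] = 0.0
    queue = [(0.0, source)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if cost > best[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (resistance[r][c] + resistance[nr][nc]) / 2
                new_cost = cost + step
                if new_cost < best[nr][nc]:
                    best[nr][nc] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return best


# Toy resistance surface (hypothetical values): a high-resistance band in
# the middle column with a single low-resistance gap in the bottom row.
resistance = [
    [1, 1, 9, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 1, 1, 1],
]
costs = least_cost_distances(resistance, source=(0, 0))
print(costs[0][4])
```

In this toy grid the cheapest route from the top-left to the top-right cell detours through the gap (cost 10) instead of crossing the band directly (cost 12). A geographical barrier can shape connectivity between vector populations in the same way.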

Gantt chart

Timeline for the project

Acknowledgement

  • Assoc. Prof. Warwick Grant
  • Dr. Shannon Hedtke
  • Dr. Karen McCulloch
  • Dr. Joel Miller
  • Dr. Rebecca Chisholm
  • The Grant Lab members

References

Basáñez, M.-G., Walker, M., Turner, H. C., Coffeng, L. E., de Vlas, S. J., & Stolk, W. A. (2016). River Blindness: Mathematical Models for Control and Elimination. Advances in Parasitology, 94, 247–341. https://doi.org/10.1016/bs.apar.2016.08.003

Hamley, J. I. D., Milton, P., Walker, M., & Basáñez, M.-G. (2019). Modelling exposure heterogeneity and density dependence in onchocerciasis using a novel individual-based transmission model, EPIONCHO-IBM: Implications for elimination and data needs. PLOS Neglected Tropical Diseases, 13(12), e0007557. https://doi.org/10.1371/journal.pntd.0007557

Hill, E., Hall, J., Letourneau, I. D., Donkers, K., Shirude, S., Pigott, D. M., Hay, S. I., & Cromwell, E. A. (2019). A database of geopositioned onchocerciasis prevalence data. Scientific Data, 6(1), 67. https://doi.org/10.1038/s41597-019-0079-5

NTD Modelling Consortium Onchocerciasis Group. (2019). The World Health Organization 2030 goals for onchocerciasis: Insights and perspectives from mathematical modelling. Gates Open Research, 3, 1545. https://doi.org/10.12688/gatesopenres.13067.1

Zouré, H. G., Noma, M., Tekle, A. H., Amazigo, U. V., Diggle, P. J., Giorgi, E., & Remme, J. H. (2014). The geographic distribution of onchocerciasis in the 20 participating countries of the African Programme for Onchocerciasis Control: (2) pre-control endemicity levels and estimated number infected. Parasites & Vectors, 7(1), 326. https://doi.org/10.1186/1756-3305-7-326

Thank you

Effect of covariates

  • The importance of covariates can be assessed with a variable importance plot
    Variable importance plot for covariates in the random forest model
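The numbers behind such a plot could be obtained as sketched below, here with permutation importance from scikit-learn. Whether the plot above shows permutation importance or an impurity-based measure is not stated in the slides. The covariate names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)

# Hypothetical covariates: only the first two actually drive the response.
names = ["distance_to_river", "rainfall", "noise_a", "noise_b"]
x = rng.normal(size=(300, 4))
y = 2.0 * x[:, 0] + 1.0 * x[:, 1] + 0.1 * rng.normal(size=300)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(x, y)

# Importance = drop in model score when one covariate is randomly shuffled.
result = permutation_importance(forest, x, y, n_repeats=10, random_state=0)

ranking = sorted(zip(names, result.importances_mean), key=lambda item: -item[1])
for name, score in ranking:
    print(f"{name:>18s}  {score:.3f}")
```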

Effect of covariates (contd)

  • Linear regression was used to assess the relationship between the covariates and the predicted prevalence
    Linear regression model for covariates and the predicted prevalence

Posterior probability distribution of effect parameter of covariates

Posterior probability distributions of the covariate effect parameters